Repetitions in strings: Algorithms and combinatorics
Abstract
The article is an overview of basic issues related to repetitions in strings, concentrating on algorithmic and combinatorial aspects. This area is important from both theoretical and practical points of view. Repetitions are highly periodic factors (substrings) in strings and are related to periodicities, regularities, and compression. The repetitive structure of strings leads to higher compression rates, and conversely, some compression techniques are at the core of fast algorithms for detecting repetitions. There are several types of repetitions in strings: squares, cubes, and maximal repetitions, also called runs. For these repetitions, we distinguish between the factors (sometimes qualified as distinct) and their occurrences (also called positioned factors). The combinatorics of repetitions is a very intricate area, full of open problems. For example, we know that the number of (distinct) primitively-rooted squares in a string of length n is no more than 2n−Θ(log n), and is conjectured to be at most n, and that their number of occurrences can be Θ(n log n). Similarly, we know that the maximum number of maximal repetitions in a string of length n is at most 1.029n and at least 0.944n, and the conjecture is again that the exact bound is n. We know almost everything about the repetitions in Sturmian words, but despite the simplicity of these words, the results are nontrivial. One of the main motivations for writing this text is the development, during the last couple of years, of new techniques and results about repetitions. We report both the progress that has been achieved and the progress that we expect to happen.
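To make the notions of runs and distinct primitively-rooted squares concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not one of the fast algorithms the article surveys. The function names (`smallest_period`, `runs`, `is_primitive`, `distinct_primitively_rooted_squares`) and the convention of reporting a run as a `(start, end, period)` triple with an exclusive `end` are assumptions made for this example.

```python
def smallest_period(w):
    """Smallest p >= 1 such that w[k] == w[k + p] for all valid k (naive)."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(n - p)):
            return p
    return n  # only reached for the empty string


def runs(s):
    """Maximal repetitions of s as (start, end, period), end exclusive.

    A run is a factor whose smallest period p satisfies length >= 2p and
    that cannot be extended left or right without breaking period p.
    """
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the block of positions k with s[k] == s[k + p].
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[start:end] has period p and is maximal for p
            # Keep it only if it is long enough and p is its smallest period.
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
    return found


def is_primitive(u):
    """A non-empty word is primitive iff it is not a proper power."""
    return len(u) > 0 and u not in (u + u)[1:-1]


def distinct_primitively_rooted_squares(s):
    """Set of distinct factors uu of s with u primitive (naive scan)."""
    n = len(s)
    squares = set()
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            root = s[i:i + p]
            if root == s[i + p:i + 2 * p] and is_primitive(root):
                squares.add(root + root)
    return squares
```

For the string `aabaab`, this sketch reports three runs (`aa` at positions 0 and 3 with period 1, and the whole string with period 3) and two distinct primitively-rooted squares (`aa` and `aabaab`). The square `aa` occurs twice but is counted once as a distinct factor, which is the factor/occurrence distinction drawn in the abstract.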
Similar resources
Combinatorics on Words
Words (strings of symbols) are fundamental in computer processing. Indeed, each bit of data processed by a computer is a string, and nearly all computer software uses algorithms on strings. There is also an abundant supply of applications of these algorithms in other areas such as data compression, DNA sequence analysis, computer graphics, cryptography, and so on. Combinatorics on words belongs to...
Approximate Cover of Strings
Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In...
Two-pattern strings II - frequency of occurrence and substring complexity
The two previous papers in this series introduced a class of infinite binary strings, called two-pattern strings, that constitute a significant generalization of, and include, the much-studied Sturmian strings. The class of two-pattern strings is a union of a sequence of increasing (with respect to inclusion) subclasses TPλ of two-pattern strings of scope λ, λ = 1, 2, .... Prefixes of two-pa...
Large-scale detection of repetitions.
Combinatorics on words began more than a century ago with a demonstration that an infinitely long string with no repetitions could be constructed on an alphabet of only three letters. Computing all the repetitions (such as ···TTT··· or ···CGACGA···) in a given string x of length n is one of the oldest and most important problems of computational stringology, requiring time in the worst case...
Lossless filter for multiple repetitions with Hamming distance
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. In this pa...
Journal: Theor. Comput. Sci.
Volume: 410
Publication date: 2009